Evaluation of the Shapd2 Algorithm Efficiency in Plagiarism Detection Task Using Pan Plagiarism Corpus
نویسنده
چکیده
This work presents results of the ongoing novel research in the area of natural language processing focusing on plagiarism detection, semantic networks and semantic compression. The results demonstrate that the semantic compression is a valuable addition to the existing methods used in plagiary detection. The application of the semantic compression boosts the efficiency of Sentence Hashing Algorithm for Plagiarism Detection 2 (SHAPD2) and authors’ implementation of the w shingling algorithm. Experiments were performed on Clough & Stephenson corpus as well as an available PAN–PC-10 plagiarism corpus used to evaluate plagiarism detection methods, so the results can be compared with other research teams.
منابع مشابه
Developing Monolingual Persian Corpus for Extrinsic Plagiarism Detection Using Artificial Obfuscation: Notebook for PAN at CLEF 2015
The task of text alignment corpus construction at PAN 2015 competition consists of preparing a plagiarism corpus so that it can provide various obfuscation types and versatile obfuscation degrees. Meanwhile, its format and metadata structure should follow previous PAN plagiarism corpora. In this paper, we describe our approach for construction of a monolingual Persian plagiarism corpus that can...
متن کاملDeveloping Bilingual Plagiarism Detection Corpus Using Sentence Aligned Parallel Corpus: Notebook for PAN at CLEF 2015
Plagiarism detection is the process of locating text reuse within a suspicious document. The plagiarism detection corpora are used for evaluating plagiarism detection systems. In this paper, we present a bilingual PersianEnglish plagiarism detection corpus. We provide our corpus for the task of text alignment corpus construction in the PAN 2015 competition. Our approach is based on parallel cor...
متن کاملOverview of the 1st International Competition on Plagiarism Detection
The 1st International Competition on Plagiarism Detection, held in conjunction with the 3rd PAN workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse, brought together researchers from many disciplines around the exciting retrieval task of automatic plagiarism detection. The competition was divided into the subtasks external plagiarism detection and intrinsic plagiarism dete...
متن کاملOverview of the AraPlagDet PAN@FIRE2015 Shared Task on Arabic Plagiarism Detection
AraPlagDet is the first shared task that addresses the evaluation of plagiarism detection methods for Arabic texts. It has two subtasks, namely external plagiarism detection and intrinsic plagiarism detection. A total of 8 runs have been submitted and tested on the standardized corpora developed for the track. This overview paper describes these evaluation corpora, discusses the participants’ m...
متن کاملPAN 2015 Shared Task on Plagiarism Detection: Evaluation of Corpora for Text Alignment: Notebook for PAN at CLEF 2015
In this paper we describe and evaluate the corpora submitted to the PAN 2015 shared task on plagiarism detection for text alignment. We received monoand cross-language corpora in the following languages (pairs): English, Persian, Chinese, and Urdu-English, English-Persian. We present an independent section for each submitted corpus including statistics, discussion of the obfuscation techniques ...
متن کامل